WHAT IS CLAIMED IS: 

1 1. A method for processing a matrix of elements in a processor, the 

2 method comprising steps of: 

3 loading a first subset of matrix elements from a first location; 

4 loading a second subset of matrix elements from a second location; 

5 storing a third subset of matrix elements in a first destination; and 

6 storing a fourth subset of matrix elements in a second destination, wherein 

7 the loading and storing steps result from a first instruction issue. 

1 2. The method for processing the matrix of elements in the processor 

2 as recited in claim 1, wherein n sub-instructions perform an n-by-n matrix transpose. 

1 3. The method for processing the matrix of elements in the processor 

2 as recited in claim 1, wherein the first loading step is performed with a first processing 

3 path and the second loading step is performed with a second processing path. 

1 4. The method for processing the matrix of elements in the processor 

2 as recited in claim 1, further comprising the steps of: 

3 loading a fifth subset of matrix elements from a fifth location; 

4 loading a sixth subset of matrix elements from a sixth location; 

5 storing a seventh subset of matrix elements in a third destination; and 

6 storing an eighth subset of matrix elements in a fourth destination. 

1 5. The method for processing the matrix of elements in the processor 

2 as recited in claim 4, wherein the loading and storing steps introduced in claim 4 result 

3 from a second instruction issue. 

1 6. The method for processing the matrix of elements in the processor 

2 as recited in claim 4, wherein each of the first through fourth destinations includes a matrix 

3 column. 

1 7. The method for processing the matrix of elements in the processor 

2 as recited in claim 1, wherein each of the first through fourth locations includes a matrix 

3 row. 
1 8. The method for processing the matrix of elements in the processor 

2 as recited in claim 1, wherein the third and fourth subsets each comprise elements from 

3 the first and second subsets. 

1 9. A processing core for transposing a matrix, comprising: 

2 a first source location comprising a first plurality of matrix elements; 

3 a second source register comprising a second plurality of matrix elements; 

4 a third source register comprising a third plurality of matrix elements; 

5 a fourth source register comprising a fourth plurality of matrix elements; 

6 a first destination register comprising a fifth plurality of matrix elements; 

7 a second destination register comprising a sixth plurality of matrix 

8 elements; 

9 a first processing path coupled to the first through fourth source registers 

10 and the first destination register; and 

11 a second processing path coupled to the first through fourth source 

12 registers and the second destination register. 

1 10. The processing core for transposing the matrix of claim 9, wherein: 

2 the first through fourth registers each include a plurality of source fields, 

3 and 

4 each source field includes a matrix element. 

1 11. The processing core for transposing the matrix of claim 9, wherein: 

2 the first and second destination registers each include a plurality of result 

3 fields, and 

4 each source field includes a matrix element. 

1 12. The processing core for transposing the matrix of claim 9, further 

2 comprising 

3 first and second instruction processors; and 

4 an exchange path between the first and second instruction processors. 

1 13. The processing core for transposing the matrix of claim 9, wherein 

2 the first processing path receives a first sub-instruction and the second processing path 

3 receives a second sub-instruction. 
1 14. The processing core for transposing the matrix of claim 9, wherein 

2 each of the first through fourth source registers includes a matrix row. 

1 15. The processing core for transposing the matrix of claim 9, wherein 

2 each of the first and second destination registers includes a matrix column. 

1 16. The processing core for transposing the matrix of claim 9, wherein 

2 the first and second destination registers are addressed by first and second sub- 

3 instructions which are included in a very long instruction word. 

1 17. A method for processing a matrix of elements, the method 

2 comprising steps of: 

3 loading a first instruction; 

4 loading a second instruction, wherein the first and second instructions 

5 address a first source register, second source register, third source register, fourth source 

6 register, first destination register and second destination register; 

7 loading a third instruction; 

8 loading a fourth instruction, wherein the third and fourth instructions 

9 address the first source register, the second source register, the third source register, the 

10 fourth source register, a third destination register and a fourth destination register; 

11 storing a first element of the first source register in the first destination 

12 register; and 

13 storing a fourth element of the first source register in the fourth destination 

14 register, wherein a plurality of the first through fourth elements comprise a same 

15 instruction issue. 

1 18. The method for processing the matrix of elements of claim 17, 

2 wherein the first and second instructions include a first operation code and the third and 

3 fourth instructions include a second operation code different from the first operation code. 

1 19. The method for processing the matrix of elements of claim 17, 

2 wherein the first and second instructions include a first operation code and the third and 

3 fourth instructions include a second operation code different from the first operation code. 
1 20. The method for processing the matrix of elements of claim 17, 

2 wherein the first instruction is a sub-instruction in a very long instruction word. 
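
The arrangement recited above can be sketched informally as a software model. The Python below is an illustration under stated assumptions, not the claimed hardware: it assumes a 4-by-4 matrix whose rows sit in four source registers, two processing paths per instruction issue, and one destination register (one matrix column) written per sub-instruction. The names `transpose_sub_instruction`, `issue`, and `transpose_4x4` are hypothetical and do not appear in the claims.

```python
def transpose_sub_instruction(sources, column):
    """Hypothetical sub-instruction run by one processing path.

    Reads all source registers (matrix rows) and gathers element
    `column` of each into one destination register (a matrix column).
    """
    return [row[column] for row in sources]


def issue(sources, columns):
    """Model one instruction issue.

    Each entry in `columns` stands for one sub-instruction, executed
    by its own processing path; one destination register is returned
    per sub-instruction.
    """
    return [transpose_sub_instruction(sources, c) for c in columns]


def transpose_4x4(sources):
    """Transpose a 4-by-4 matrix with two issues of two sub-instructions."""
    # First issue: fills the first and second destination registers.
    d0, d1 = issue(sources, (0, 1))
    # Second issue: fills the third and fourth destination registers.
    d2, d3 = issue(sources, (2, 3))
    return [d0, d1, d2, d3]


# Four source registers, each holding one matrix row.
matrix = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
    [12, 13, 14, 15],
]
result = transpose_4x4(matrix)
```

In this model, four sub-instructions in total produce the 4-by-4 transpose, and each destination register ends up holding elements drawn from every source register.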